Robust full waveform inversion of seismic data method and device

ABSTRACT

A method for calculating a velocity model for a subsurface of the earth. The method includes receiving (200) measured seismic data d; calculating (204) predicted seismic data p; selecting (206) a matching filter w that when applied to one of the measured seismic data d or the predicted seismic data p reproduces the other one of the measured seismic data d or the predicted seismic data p; selecting (208) a misfit function J that calculates (1) a distance between the matching filter w and a Dirac Delta function or (2) a travel time shift associated with the measured seismic data; and calculating (218) a new velocity model using the misfit function J, the measured seismic data d, and the predicted seismic data p. The measured seismic data d includes wavefields generated by a seismic source and the wavefields propagate through the subsurface where they are attenuated and reflected, and the attenuated and reflected wavefields are recorded by plural seismic receivers.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. National Stage Application of InternationalApplication No. PCT/IB2019/050634, filed on Jan. 25, 2019, which claimspriority to U.S. Provisional Patent Application No. 62/648,569, filed onMar. 27, 2018, entitled “ROBUST FULL WAVEFORM INVERSION OF SEISMIC DATAMETHOD AND DEVICE,” and U.S. Provisional Patent Application No.62/713,246, filed on Aug. 1, 2018, entitled “ROBUST FULL WAVEFORMINVERSION OF SEISMIC DATA METHOD AND DEVICE,” the disclosures of whichare incorporated herein by reference in their entirety.

BACKGROUND Technical Field

Embodiments of the subject matter disclosed herein generally relate tomethods and devices for processing seismic data, and more specifically,to applying a robust full waveform inversion (FWI) to seismic dataacquired for a given subsurface of the Earth.

Discussion of the Background

Seismic data acquisition and processing techniques generate a profile(image) of the geophysical structure (subsurface) under the surface ofthe Earth. While this profile does not provide an accurate location foroil and gas reservoirs or other resources, it suggests, to those trainedin the field, the presence or absence of oil and/or gas. Thus, providinga high-resolution image of the subsurface is an ongoing process for theexploration of natural resources, including, among others, oil and/orgas. The FWI is one approach to calculating an image of the subsurface.However, the traditional FWI approaches are plagued by various problemsthat are now discussed.

Full waveform inversion (Virieux and Operto, 2009) aims at producinghigh-resolution subsurface models capable of computing predicted datathat fits the observed seismic waveforms. The FWI is a highly-nonlinearinversion process. Because of this, the subsurface model is iterativelyupdated using a linearized version of it to reduce the mismatch betweenthe predicted and measured (or observed) seismic data. Mathematically, amisfit function is introduced to characterize such mismatch between thepredicted and measured seismic data. Selecting a misfit function is animportant ingredient of the optimization problem: a well-behaved misfitfunction would release the requirement for a good initial velocity modelor usable low-frequency signal in the seismic data and resolve theso-called “cycle skipping” issue.

Over the past decade, the least square “l₂” norm was widely used as amisfit function for its simplicity and potential for high-resolutionmodels. However, this misfit function suffers from severe cycle skippinglimitations. The conventional “l₂” norm based misfit function issusceptible to local minima if the lowest wavenumber of the initialmodel is inaccurate or usable low-frequency signals are absent from theobserved seismic data. In this respect, note that acquiringlow-frequency signals during seismic data acquisition surveys isdifficult.

One option to overcome such a “cycle skipping” issue is to extend thesearch space to allow seismic data comparisons beyond the“point-to-point” subtraction. An extended function is computed by adeconvolution of the observed and predicted seismic data. If the modeland the simulation are correct, the extended function would be similarto a Dirac Delta function, and the energy will focus to zero lag (time)of the deconvolution function. An optimization problem can be formulatedby measuring this extended function or its attributes' departure from aDirac Delta function, or measuring the energy not residing at zero lag.As the extended function replaces the local, sample by samplecomparison, with a more global comparison by deconvolution, it canresolve the “cycle skipping” issue. However, as the extended function iscomputed using the whole trace of the observed and predicted seismicdata, the extended function is prone to unwanted cross-talk of differentevents between the observed and predicted seismic data during thecomputation process.

Currently, new and more advanced misfit functions were proposed, suchas: the matching filter based misfit function (Van Leeuwen and Mulder,2008, 2010; Luo and Sava, 2011; Warner and Guasch, 2016; Huang et al.,2017; Debens et al., 2017), the optimal transport misfit function(Metivier et al., 2016; Yang et al., 2018; Yang and Engquist, 2018),etc. However, these new misfit functions, including the deconvolutionprocess, still face issues. In complex reflectivity regions, there isconsiderable overlap of the seismic events that may cause cross-talk inthe matching filter and hamper its effectiveness. In fact, the extensionfunction given by the deconvolution can be a drawback in this case. As aresult, the traditional methods are sometimes limited to diving waves(Huang et al., 2017) and they suffer when there are strong multiples(Debens et al., 2017) present in the observed seismic data.

Current implementations of the optimal transport process in the FWI arelimited to a measure of the distance between the predicted data and theobserved data (Metivier et al., 2016; Yang et al., 2018; Yang andEngquist, 2018). However, a requirement of the optimal transport theoryindicates that the variables to be measured should be a statisticaldistribution, i.e., the distribution should have only positive valuesand its integration over time should be equal to 1. As the recordedseismic data includes traces that are oscillatory in nature and they donot fulfill the requirements of a statistical distribution, transformingthe seismic data into a distribution (Yang and Engquist, 2018; Qiu etal., 2017) would alter either the amplitude or the phase of its traces,which would make the application of the inversion process unstable andinaccurate.

The current optimization of the matching filter to penalize energyresiding away from zero lag relies on the Born approximation in whichthe adjoint source measures the sensitivity of the objective function tothe modeled data directly (Van Leeuwen and Mulder, 2008, 2010; Luo andSava, 2011; Warner and Guasch, 2016; Huang et al., 2017; Debens et al.,2017). The resulting adjoint source is focused on the amplitude of theenergy away from zero lag, rather then its distance (time). As a result,such methods have an inherent limitation in dealing with broadband data(having a white spectrum), and suffer from slow convergence.

Accordingly, there is a need to provide a FWI process that overcomes theaforementioned deficiencies in data complexity and potential crosstalk,in requirements, and in update procedures.

SUMMARY

According to an embodiment, there is a method for calculating a velocitymodel for a subsurface of the earth. The method includes receivingmeasured seismic data d, calculating predicted seismic data p, selectinga matching filter w that when applied to one of the measured seismicdata d or the predicted seismic data p reproduces the other one of themeasured seismic data d or the predicted seismic data p, selecting amisfit function J that calculates (1) a distance between the matchingfilter w and a Dirac Delta function or (2) a travel time shiftassociated with the measured seismic data, and calculating a newvelocity model using the misfit function J, the measured seismic data d,and the predicted seismic data p. The measured seismic data d includeswavefields generated by a seismic source and the wavefields propagatethrough the subsurface where they are attenuated and reflected, and theattenuated and reflected wavefields are recorded by plural seismicreceivers.

According to another embodiment, there is a method for calculating avelocity model for a subsurface of the earth. The method includesreceiving measured seismic data d, calculating predicted seismic data p,transforming the measured seismic data d and the predicted seismic datap from a time-space domain to a tau-p domain, selecting a matchingfilter w in the tau-p domain, wherein the matching filter is applied inthe tau-p domain to the measured seismic data d to reproduce thepredicted seismic data p, selecting a misfit function J in the tau-pdomain, wherein the misfit function J uses (i) a distance between (1)the matching filter w and (2) a Dirac Delta function or another functionrepresentation of focusing at zero lag or (ii) a travel time shiftassociated with the measured seismic data, and calculating a newvelocity model using the misfit function J, the measured seismic data d,and the predicted seismic data p. The measured seismic data d includeswavefields generated by a seismic source and the wavefields propagatethrough the subsurface where they are attenuated and reflected, and theattenuated and reflected wavefields are recorded by plural seismicreceivers.

According to still another embodiment, there is a computing unit forcalculating a velocity model for a subsurface of the earth. Thecomputing unit includes an interface for receiving measured seismic datad, and a processor connected to the interface. The processor isconfigured to calculate predicted seismic data p, select a matchingfilter w that when applied to the measured seismic data reproduces thepredicted seismic data p, select a misfit function J that calculates (1)a distance between the matching filter w and a Dirac Delta function or(2) a travel time shift associated with the measured seismic data, andcalculate a new velocity model using the misfit function J, the measuredseismic data d, and the predicted seismic data p. The measured seismicdata d includes wavefields generated by a seismic source and thewavefields propagate through the subsurface where they are attenuatedand reflected, and the attenuated and reflected wavefields are recordedby plural seismic receivers.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute apart of the specification, illustrate one or more embodiments and,together with the description, explain these embodiments. In thedrawings:

FIG. 1 illustrates a source that generates seismic waves which aftergetting reflected by the subsurface, are reflected by one or morereceivers;

FIGS. 2A and 2B illustrate a flowchart of a method for calculating a newvelocity model using a misfit function that penalizes a matching filter;

FIG. 3 is a flowchart of a method for updating a model using a fullwaveform inversion;

FIG. 4A illustrates a common shot gather in the time-space domain andFIG. 4B illustrates the same gather transformed into the tau-p domain;

FIGS. 5A and 5B illustrate a flowchart of a method for calculating a newvelocity model using a misfit function that penalizes a matching filterin a tau-p domain;

FIG. 6 is a flowchart of a method for conditioning a matching filterthat is used in a misfit function during a full waveform inversionprocess; and

FIG. 7 illustrates a computing unit in which one or more of the methodsdiscussed above can be implemented.

DETAILED DESCRIPTION

The following description of the embodiments refers to the accompanyingdrawings. The same reference numbers in different drawings identify thesame or similar elements. The following detailed description does notlimit the invention. Instead, the scope of the invention is defined bythe appended claims. The following embodiments are discussed, forsimplicity, with regard to applying the FWI process to measured seismicdata. However, the invention is not limited to seismic data, but it maybe applied to any type of data.

Reference throughout the specification to “one embodiment” or “anembodiment” means that a particular feature, structure or characteristicdescribed in connection with an embodiment is included in at least oneembodiment of the subject matter disclosed. Thus, the appearance of thephrases “in one embodiment” or “in an embodiment” in various placesthroughout the specification is not necessarily referring to the sameembodiment. Further, the particular features, structures orcharacteristics may be combined in any suitable manner in one or moreembodiments.

According to an embodiment, a Radon transform is introduced fortransforming the recorded seismic data from the time-space domain to aRadon domain to mitigate cross-talk issues. The Radon transform may beadded, for example, into the flow of the matching filter based FWIprocess. In this embodiment, instead of designing an extended function(matching filter) for each trace directly, at first, the recordedseismic data is transformed, for example, transform the shot gathersinto the τ-p domain (where tau is the intercept time and p is the slope)using the Radon transform and then compute, in the Radon domain, theextended function for each trace, indexed by the same p value bydeconvolution. The benefits of the Radon transform are used to separatedifferent events and thus, reduce the cross-talk in the followingdeconvolution stage. Instead of designing the extended function bylooking at a single trace, the process in this embodiment also looks atthe traces nearby and shift them to the tau-p domain by taking thecoherence of the event into consideration. The criterion for asuccessful inversion is still unchanged, i.e., the extended functionshould also be a Delta function in the tau-p domain when the velocity iscorrect.

In order to resolve the issues of the traditional FWI process, insteadof measuring directly the distance between the predicted and measuredseismic data, according to an embodiment, the method would at firstcompute a matching filter based on the predicted data and measuredseismic data and then measure the distance between the resultingmatching filter and the Dirac Delta function. In one embodiment, aprecondition would be used to transform the resulting matching filter tobe a statistical distribution and, because this embodiment does notdirectly modify the predicted or observed seismic data, the phase andamplitude of the seismic signal are preserved and the followinginversion process should be stable and accurate.

In another embodiment, a Rytov approximation is introduced into thematching filter based optimization process. Thus, according to thisembodiment, the method computes an adjoint source using the Rytovapproximation. Using the Rytov approximation, it is possible to mitigatean inherent limitation of the traditional Born approximation that iscurrently used for a band-limited seismic dataset. Note that applyingthe Rytov approximation may take place in the tau-p domain but also inthe time-space domain. For example, it is possible to calculate thematching filter and the misfit function exclusively in the time-spacedomain and also to apply the Rytov approximation to the matching filterin the time-space domain.

According to an embodiment, a new FWI process is introduced and thisprocess includes at least one of the following new features: (1) using aRytov approximation to develop an adjoint source and applying the fullwaveform inversion in the time-space domain for calculating a newvelocity model, (2) applying a Radon transform to the seismic data forthe application of the full waveform inversion in the Radon domain, and(3) selecting a new misfit function for measuring a distance between theresulting matching filter and the Dirac Delta function. Note that thethird feature (3) can be applied together with the first or secondfeatures. Before discussing in more detail these features, a briefreview of the matching filter based FWI process is believed to be inorder.

In the FWI process, a forward modeling, which is described by a waveequation operator L, is used to generate the predicted seismic data p(t;x_(r); x_(s)). Note that the predicted seismic data may be generatedfrom log data, data from a previous seismic survey, or based on theguess of one experienced in this field. The wave equation operator L isapplied to the measured seismic data to obtain a source function f(x,t),as follows:L(m)u _(S)(t,x,x _(S))=ƒ(x,t)  (1),where x and x_(s) are the space coordinates and source positionrespectively, L(v) denotes the wave equation operator, m describes themodel parameters, f(x; t) is the source function, and u_(S) is thesource wavefield. The predicted data p(t; x_(r); x_(s)) may then besampled from the source wavefield by a sampling matrix W, i.e., p(t;x_(r); x_(s))=Wu_(S). A misfit function J is then selected, e.g., theleast square “l₂” norm misfit function which is used to fit the observeddata d(t; x_(r); x_(s)). One example of the misfit function J is asfollows:

$\begin{matrix}{{J = {\sum\limits_{x_{r}}{\sum\limits_{x_{s}}{\sum\limits_{t}\left\lbrack {{p\left( {t,x_{r},x_{s}} \right)} - {d\left( {t,x_{r},x_{s}} \right)}} \right\rbrack^{2}}}}},} & (2)\end{matrix}$where x_(r) are the coordinates for the seismic receiver position andx_(s) are the coordinates for the source position as illustrated in FIG.1 . Note that FIG. 1 shows a land seismic survey system 100 having onesource S at the surface 102 of the earth and plural receivers R alsolocated at the surface of the earth. The measured seismic data dincludes wavefields generated by the seismic source S and the wavefieldsthat propagate through the subsurface, where they are attenuated andreflected, and the attenuated and reflected wavefields are recorded bythe seismic receivers R. However, the method discussed herein also worksfor plural sources S distributed over the surface of the earth, or forplural sources being buried under the ground, or plural receivers beingburied under the surface of the earth, or the sources being towed by avessel in water while the receivers are towed on a streamer behind thesame vessel or different vessels or the receivers are ocean bottomcables distributed on the ocean's bottom. FIG. 1 also shows that seismicwaves 104 and 106 (e.g., sound waves or electromagnetic waves) emittedby the source S are reflected at various interfaces 108 and 110,respectively, in the subsurface. The interfaces 108 and 110 areboundaries between various layers 112A to 112C of the earth.

The traditional least square “l₂” norm misfit function measures themismatch between two seismic traces by their “sample by sample”difference. However, as already discussed above, this misfit function ishighly prone to cycle skipping. In practice, the application of the “l₂”norm misfit function would either need kinematically accurate initialmodels or very low-frequency seismic data to guarantee globalconvergence to the desired accurate model.

A matching filter based FWI process uses an extended function (ormatching filter) w(t; x_(r); x_(s)) for each trace so that the followingequation is satisfied:w(t,x _(r) ,x _(s))*d(t,x _(r) ,x _(s))=p(t,x _(r) ,x _(s)),  (3)where * denotes the convolution operation. The matching filter w can be2D or 3D over time and space, if needed. Equation (3) is a linearequation and the computation of the matching filter w is a well-resolvedproblem in signal processing. The matching filter can be computed, forexample, either in the time domain or in the frequency domain, asfollows:

$\begin{matrix}{{{w\left( {t,x_{r},x_{s}} \right)} = {\mathcal{F}^{-}\left\lbrack \frac{{\mathcal{F}\left\lbrack {p\left( {t,x_{r},x_{s}} \right)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {d\left( {t,x_{r},x_{s}} \right)} \right\rbrack}}{{{\mathcal{F}\left\lbrack {d\left( {t,x_{r},x_{s}} \right)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {d\left( {t,x_{r},x_{s}} \right)} \right\rbrack}} + \epsilon} \right\rbrack}},} & (4)\end{matrix}$where

and

⁻ denote a Fourier transform between time and frequency and an inverseFourier transform, respectively, the overline denotes the complexconjugation operator, and ϵ is a small positive number to avoid divisionwith zero.

When the velocity model (the model that describes the velocity of soundwaves 104 and 106 in each layer 112 of earth in FIG. 1 ) is correct, thematching filter w is reduced to a Dirac Delta function δ, or moregenerally, the energy focuses near zero lag. Thus, in this embodiment,it is possible to define an optimization problem by applying a penaltyfunction to the matching filter w, to penalize the energy at non-zerotime lag, to measure a discrepancy between the observed seismic data dand the predicted seismic data p, for an incorrect velocity model asfollows:

$\begin{matrix}{{J = {\frac{1}{2}{\sum\limits_{x_{r}}{\sum\limits_{x_{s}}{\sum\limits_{t}\left\lbrack {{P(t)}{w\left( {t,x_{r},x_{s}} \right)}} \right\rbrack^{2}}}}}},} & (5)\end{matrix}$where P(t) is the penalty function. The time lag noted above appearswhen one considers the measured data d and the predicted data p. Thesetwo sets of data, when used in equation (2) with the matching filter w,may have their time “t” not matching to each other, i.e., the time ofone set of data lagging the time of the other set of data. Thus, theterm “zero time lag” means that there is a perfect time match betweenthe times of the two sets of data, i.e., the matching filter is a deltafunction, which when applied to the measured seismic data d results inthe predicted seismic data p. However, if there is time mismatch betweenthese two sets of data, then a non-zero time lag is present.

In one embodiment, the penalty function P(t) may be a linear function.For example, the penalty function may be P(t)=|t|. Other values for thepenalty function may be used. Note that in this embodiment the penaltyfunction penalizes the matching filter w at non-zero time lag while inthe art, the penalty function penalizes the data itself, i.e., themeasured and predicted seismic data. This different approach ofpenalizing the matching filter and not the seismic data is believed tobe novel.

Another novel feature of the FWI process is the use of the Rytovapproximation. Using this approximation, the matching filter may beexpressed as a sum of shifted weighted Dirac Delta functions, as follow:

$\begin{matrix}{{{w\left( {t,x_{r},x_{s}} \right)} = {{\sum\limits_{i}{w_{i}\left( {t,x_{r},x_{s}} \right)}} = {\sum\limits_{i}{{a_{i}\left( {x_{r},x_{s}} \right)}{\delta\left\lbrack {t - {\tau_{i}\left( {x_{r},x_{s}} \right)}} \right\rbrack}}}}},} & (6)\end{matrix}$where w_(i) are the matching filter coefficients, a_(i) is an amplitudethat does not vary with the time, but only with the positions x_(r) andx_(s) of the receivers and sources, δ is the Dirac Delta function, andτ_(i) is the time sampling as a function of source and receiverlocation, and thus, it depends only on the positions x_(r) and x_(s),and it could be the tau coordinate (intercept time) in the tau-p domainas a function of a space coordinate (like source or receiver) and slopep.

In this definition, the matching filter coefficients w_(i) aredetermined by 2 parameters (a_(i); τ_(i)), and thus, the filtercoefficients w_(i) can change in two ways, either because of theamplitude a_(i) or because of the tau component τ_(i), which meansshifting the time lag. Changing the amplitude a_(i) corresponds to theconventional Born approximation, which is currently used by the existingimplementations. Changing the tau component τ_(i) corresponds to theRytov approximation and this feature is new to the matching filterimplementation. Thus, according to this embodiment, the tau componentτ_(i) is varied to implement the Rytov approximation.

For the FWI process, as discussed above, there is a mismatch between themeasured seismic data d and the predicted seismic data p. This mismatchcan be accounted for in the FWI process by using the notion of an“adjoint source” s, i.e., a source that is responsible for a measure ofthe mismatch. This adjoint source is not a physical source. It is rathera mathematical construct that is responsible for the difference betweenthe measured data and the predicted data and it can be used withwavefields. In this embodiment, the adjoint source is back-propagated toupdate the model.

Using the Rytov approximation, the adjoint source s is described in oneembodiment by:

$\begin{matrix}{{{\delta\;{s\left( {t,x_{r},x_{s}} \right)}} = {\sum\limits_{i}{\frac{\partial\tau_{i}}{\partial p}\frac{\partial J}{\partial\tau_{i}}}}},} & (7)\end{matrix}$where p is the predicted seismic data, J is the misfit function, andτ_(i) is the tau coordinate. If equation (6) is substituted intoequation (5), and the penalty function is considered to be P(t)=|t|,then the misfit function J becomes:

$\begin{matrix}{{J = {\frac{1}{2}{\sum\limits_{x_{r}}{\sum\limits_{x_{s}}{\sum\limits_{t}{{a_{i}^{2}\left( {x_{r},x_{s}} \right)}{\tau_{i}^{2}\left( {x_{r},x_{s}} \right)}}}}}}}.} & (8)\end{matrix}$

The adjoint source can be rewritten, based on equation (8), as:

$\begin{matrix}{{{\delta\;{s\left( {t,x_{r},x_{s}} \right)}} = {\sum\limits_{i}{\frac{\partial{\tau_{i}\left( {x_{r},x_{s}} \right)}}{\partial p}{\tau_{i}\left( {x_{r},x_{s}} \right)}{a_{i}^{2}\left( {x_{r},x_{s}} \right)}}}},} & (9)\end{matrix}$

Using the Rytov approximation of the matching filter (see equation (6))in equation (3), the convolution of the matching filter with themeasured seismic data d becomes:

$\begin{matrix}{{{d\left( {t,x_{r},x_{s}} \right)}*{\sum\limits_{i}{{a_{i}\left( {x_{r},x_{s}} \right)}{\delta\left\lbrack {t - {\tau_{i}\left( {x_{r},x_{s}} \right)}} \right\rbrack}}}} = {{p\left( {t,x_{r},x_{s}} \right)}.}} & (10)\end{matrix}$

Using the rule of derivative for implicit functions, for a connectionfunction F(u, v)=0, the derivative of u with v can be written as:

$\begin{matrix}{{\frac{du}{dv} = {- \frac{\frac{\partial{F\left( {u,v} \right)}}{\partial v}}{\frac{\partial{F\left( {u,v} \right)}}{\partial u}}}}.} & (11)\end{matrix}$

Applying the rule of derivative shown in equation (11) while consideringequation (10) to be the connection function, results in:

$\begin{matrix}{{\frac{\partial{\tau_{i}\left( {x_{r},x_{s}} \right)}}{\partial p} = {{- \frac{1}{d^{\prime}\left( {{t - \tau_{i}},x_{r},x_{s}} \right)}} = {- \frac{d^{\prime}\left( {{t - \tau_{i}},x_{r},x_{s}} \right)}{{{d\left( {{t - \tau_{i}},x_{r},x_{s}} \right)}}_{2}^{2}}}}}.} & (12)\end{matrix}$

Substituting equation (12) into equation (9) results in the followingexpression for the adjoint source:

$\begin{matrix}{{{\delta{s\left( {t,x_{r},x_{s}} \right)}} = {- {\sum\limits_{i}{\frac{d^{\prime}\left( {{t - \tau_{i}},x_{r},x_{s}} \right)}{{{d\left( {{t - \tau_{i}},x_{r},x_{s}} \right)}}_{2}^{2}}{\tau_{i}\left( {x_{r},x_{s}} \right)}{a_{i}^{2}\left( {x_{r},x_{s}} \right)}}}}}.} & (13)\end{matrix}$

Having the adjoint source given by equation (13), the adjoint statewavefield u_(R) can be computed as follows:

$\begin{matrix}{{{{L^{*}(m)}{u_{R}\left( {t,x,x_{S}} \right)}} = {\sum\limits_{x_{r}}{\delta{s\left( {t,x_{r},x_{s}} \right)}}}},} & (14)\end{matrix}$where x=(x, y, z) is the vectorial coordinate of the subsurface and L*is the adjoint wave equation operator. Note that L may represent anywave equation operator that is applicable to a wave. For example, in oneapplication, the operator L may be given as follows:

$\begin{matrix}{{\frac{\partial^{2}{u_{s}\left( {t,x,x_{S}} \right)}}{\partial t^{2}} = {{\Delta{u_{s}\left( {t,x,x_{S}} \right)}} + {{\delta\left( {x - x_{s}} \right)}{f(t)}}}},{and}} & (15)\end{matrix}$

$\begin{matrix}{{\frac{\partial^{2}{u_{R}\left( {t,x,x_{S}} \right)}}{\partial t^{2}} = {{\Delta{u_{R}\left( {t,x,x_{S}} \right)}} + {\sum\limits_{x_{r}}{\delta{s\left( {t,x_{r},x_{s}} \right)}}}}},} & (16)\end{matrix}$where Δ is the Laplace operator. Equations (1) and (15) would be solvedforward in time, from t=0 to t=T_(max) while equations (14) and (16)would be solved backward in time (reverse time) from t=T_(max) to t=0.

The gradient g with respect to the model parameter m may be calculatedas:

$\begin{matrix}{{g = {\sum\limits_{t}{\sum\limits_{x_{s}}{\frac{\partial{L(m)}}{\partial n}{u_{s}\left( {t,x,x_{S}} \right)}{u_{R}\left( {t,x,x_{S}} \right)}}}}}.} & (17)\end{matrix}$

Then, in one embodiment to find a maximum or minimum of the model forparameter m, a linear search using equation (1) is performed for findinga step γ that guarantees a sufficient decrease of the misfit function J,so that:J(m−γg)<J(m),  (18)so that the new model parameter m is given by:m _(new) =m−γg.  (19)

The above equations are iteratively computed until a pre-determinedstopping criteria is met, e.g., repeat the calculations up to a maximumnumber of iterations or the misfit function value J(m−γg) drops to acertain value. In one embodiment, the model is the velocity model.

The FWI process discussed above is now illustrated with regard to FIGS.2A and 2B. FIG. 2A is a flowchart that illustrates a method forcalculating a velocity model for a given set of seismic data d.According to this method, in step 200, a set of measured seismic data dis received. The set can be land data, marine data, log data, etc. Instep 202, a starting velocity model v is selected. This initial orstarting model may be derived from other seismic data, log data, or maybe a guess. Using the starting velocity model v, the predicted data p iscalculated in step 204 using a wave equation operator L and the startingvelocity model v (see also equation (1)). A matching filter w isselected in step 206 (see equation (3)) and a misfit function J isselected in step 208 (see equation (5)). The misfit function J is basedon the matching filter w and on a penalty function P. As previouslydiscussed, the penalty function penalizes the matching filter when thematching filter deviates from a Delta Dirac function. In step 210, aRytov approximation (see equation (6)) is applied to the matching filterw. In step 212, the adjoint source is expressed based on the misfitfunction (see equations (7) and (13)), in step 214 the adjoint statewavefield u_(R) is calculated, for example, based on equation (14), andin step 216 a gradient of the wave field velocity operator L with regardto the model parameter m is calculated, for example, based on equation(15). Based on the calculated gradient g, a global minimum or maximum ofthe misfit function J is calculated, which produces in step 218 the newvelocity model v_(new). This process may be repeated a few times, ifnecessary, until a predetermined condition is met, for example, misfitfunction values is below a given threshold, or the calculations arerepeated a predetermined number of times. In step 220, based on the newvelocity model, an image of the subsurface may be calculated.

FIG. 3 also illustrates the FWI process using the penalty function P(t)and the Rytov approximation. The initial velocity model v is received instep 300. Based on this model and equation (8), calculations areperformed in step 302 for generating the predicted (or synthetic)seismic data 304. The measured seismic data 306 is used together withthe predicted seismic data 304, for calculating in step 308 the adjointsource s. Equations (4), (6), and (7) may be used in step 308 forcalculating the adjoint source. In step 310, the inversion is applied tothe adjoint source to calculate the gradient g (using equation (15)) andthen a linear search is performed (using equations (13) and (16)) forupdating the velocity model to a new velocity model v_(new). Step 310may involve the use of equations (8)-(11). If the updated model v_(new)converges in step 312, i.e., the misfit function has reached a minimumor maximum, this model is considered in step 314 as being the finalmodel. If this is not the case, the process returns to step 302 forrepeating the steps noted above.

According to another embodiment, the matching filter discussed above maybe used together with a Radon transform in the tau-p domain as nowdiscussed. The matching filter w computed using equations (3) and (4)works with an entire trace from the measured and predicted seismic dataas input, and would suffer from the cross-talk between different events.In this embodiment, a Radon transform is used to separate the eventsbased on their space-time coherence, i.e., the slope information, beforeapplying the matching filter. This approach leads to a more robustcomputation of the matching filter and as a result, to a cleaner andnoise-free gradient g, which would be beneficial for a more stable andaccurate velocity updating algorithm.

For a common shot gather q(t, x_(r), x_(S)), the Radon transform

, which is a linear operator, maps the seismic data (the gather in thiscase) from the time-space domain (t-x) to the Radon domain (tau-p) asfollows:

$\begin{matrix}{{{\overset{\sim}{q}\left( {\tau,x_{r},x_{S}} \right)} = {{\mathcal{R}\left\lbrack {q\left( {t,x_{r},x_{S}} \right)} \right\rbrack} = {\sum\limits_{h}{q\left( {{\tau + p},x_{r},x_{S}} \right)}}}},} & (20)\end{matrix}$where p is the slope in the tau-p domain, the tilde above the qindicates the Radon domain, and h is a local spatial coordinate (for a2D problem, these vectors would have one component and for a 3D problemthey will have two components). The Radon transform

is applied locally in this embodiment and a constraint may be appliedfor each component |h_(i)|<h_(max), where h_(max) is a given threshold.FIG. 4A illustrates a seismic common shot gather q(t, x_(r),x_(S)) inthe time-space domain (note that the Y axis indicates a distance betweenthe receiver and the source and the X axis indicates a depth, measuredin time, for each measured ray) and FIG. 4B illustrates the same gatherbut in the Radon domain {tilde over (q)}(τ, x_(r), x_(S)).

The adjoint operator of the Radon transform is

^(T), which transforms the data from the tau-p domain back to the t-xdomain, and this operator can be written as:

$\begin{matrix}{{q\left( {t,x_{r},x_{S}} \right)} = {{\mathcal{R}^{T}\left\lbrack {\overset{\sim}{q}\left( {\tau,p,x_{r},x_{S}} \right)} \right\rbrack} = {\sum\limits_{p}{{\overset{\sim}{q}\left( {{t - {p\left( {x_{r} + h} \right)}},x_{r},x_{S}} \right)}.}}}} & (21)\end{matrix}$

For a tau-p domain implementation of the matching filter, both theobserved and the predicted seismic data have to be first transformedinto the tau-p domain using equation (20),

$\begin{matrix}{{{\overset{\sim}{p}\left( {\tau,p,x_{r},x_{S}} \right)} = {\sum\limits_{h}{p\left( {{\tau + {p \cdot \left( {x_{r} + h} \right)}},x_{r},x_{S}} \right)}}},} & (22) \\{{\overset{\sim}{d}\left( {\tau,p,x_{r},x_{S}} \right)} = {\sum\limits_{h}{{d\left( {{\tau + {p \cdot \left( {x_{r} + h} \right)}},x_{r},x_{S}} \right)}.}}} & (23)\end{matrix}$

Then, the matching filter {tilde over (w)}(τ, p, x_(r), x_(s)) isdirectly defined in the tau-p domain, for example, as:{tilde over (w)}(τ,p,x _(r) ,x _(s))*{tilde over (d)}(τ,p,x _(r) ,x_(s))={tilde over (p)}(t,p,x _(r) ,x _(s)),  (24)and it can be calculated in the frequency domain as:

$\begin{matrix}{{\overset{\sim}{w}\left( {\tau,p,x_{r},x_{s}} \right)} = {{\mathcal{F}^{-}\left\lbrack \frac{{\mathcal{F}\left\lbrack {\overset{\sim}{p}\left( {\tau,p,x_{r},x_{s}} \right)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {\overset{\sim}{d}\left( {\tau,p,x_{r},x_{s}} \right)} \right\rbrack}}{{{\mathcal{F}\left\lbrack {\overset{\sim}{d}\left( {\tau,p,{x_{\gamma r}x_{S}}} \right)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {\overset{\sim}{d}\left( {\tau,p,x_{r},x_{s}} \right)} \right\rbrack}} + \epsilon} \right\rbrack}.}} & (25)\end{matrix}$In one application, the misfit function in the tau-p domain is thenwritten as a function of the matching filter in the tau-p domain as:

$\begin{matrix}{{J = {\frac{1}{2}{\sum\limits_{x_{r}}{\sum\limits_{x_{s}}{\sum\limits_{t}{\sum\limits_{\tau}\left\lbrack {{P(\tau)}{\overset{\sim}{w}\left( {\tau,p,x_{r},x_{s}} \right)}} \right\rbrack^{2}}}}}}},} & (26)\end{matrix}$where the penalty function P(τ) has the same configuration as thepenalty function P(t) in equation (5). The matching filter {tilde over(w)}(τ, p, x_(r), x_(s)) in the tau-p domain may be written as a sum ofshifted weighted Dirac Delta functions δ,

$\begin{matrix}{{\overset{˜}{w}\left( {\tau,p,x_{r},x_{s}} \right)} = {{\sum\limits_{i}{{\overset{\sim}{w}}_{i}\left( {\tau,p,x_{r},x_{s}} \right)}} = {\sum\limits_{i}{{a_{i}\left( {p,x_{r},x_{s}} \right)}{{\delta\left\lbrack {t - {\tau_{i}\left( {p,x_{r},x_{s}} \right)}} \right\rbrack}.}}}}} & (27)\end{matrix}$

If the penalty function is selected to be P(τ)=|τ|, then the misfitfunction J can be written as:

$\begin{matrix}{J = {\frac{1}{2}{\sum\limits_{x_{r}}{\sum\limits_{x_{s}}{\sum\limits_{i}{\sum\limits_{p}{{a_{i}^{2}\left( {p,x_{r},x_{s}} \right)}{{\tau_{i}^{2}\left( {p,x_{r},x_{s}} \right)}.}}}}}}}} & (28)\end{matrix}$Then, the adjoint source can be written as:

$\begin{matrix}\begin{matrix}{{\delta\;{s\left( {t,x_{r},x_{s}} \right)}} = {\frac{\partial J}{\partial\tau} = {\frac{\partial\overset{\sim}{p}}{\partial p}\left\lbrack \frac{\partial J}{\partial\overset{\sim}{p}} \right\rbrack}}} \\{{= {\mathcal{R}^{T}\left\lbrack {\sum\limits_{i}{\frac{\partial{\tau_{i}\left( {p,x_{r},x_{s}} \right)}}{\partial\overset{\sim}{p}}{\tau_{i}\left( {p,x_{r},x_{s}} \right)}{a_{i}^{2}\left( {p,x_{r},x_{s}} \right)}}} \right\rbrack}},}\end{matrix} & (29)\end{matrix}$where

^(T) is the adjoint Radon transform defined by equation (21). The term

$\begin{matrix}\frac{\partial{\tau_{i}\left( {p,x_{r},x_{s}} \right)}}{\partial\overset{\sim}{p}} & \;\end{matrix}$can be evaluated similar to the corresponding term in equation (11),which leads to:

$\begin{matrix}{{\frac{\partial{\tau_{i}\left( {p,x_{r},x_{s}} \right)}}{\partial\overset{\sim}{p}} = {{- \frac{1}{d^{\prime}\left( {{t - \tau_{i}},p,x_{r},x_{s}} \right)}} = {- \frac{d^{\prime}\left( {{t - \tau_{i}},p,x_{r},x_{s}} \right)}{{{d\left( {{t - \tau_{i}},p,x_{r},x_{s}} \right)}}_{2}^{2}}}}}.} & (30)\end{matrix}$Having the adjoint source given by equation (29), the steps forcalculating the new velocity model are similar to those described in themethod of FIGS. 2A and 2B.

In other words, according to this embodiment, which is illustrated inFIGS. 5A and 5B, the FWI process includes a step 500 in which a set ofmeasured seismic data d is received. The set can be land data, marinedata, log data, etc. In step 502, a starting velocity model v isselected. This initial or starting model may be determined from otherseismic data, log data, or may simply be a guess. Using the startingvelocity model v, the predicted data p is calculated in step 504 using awave equation operator L (as in equation (1)) and the initial velocitymodel v. In step 506, both the measured seismic data d and the predictedseismic data p are transformed from the time-space domain to the tau-pdomain using a Radon transform. A matching filter w is selected in step508 directly in the Radon transform, and a misfit function J is selectedin step 510, also in the Radon transform. The misfit function J is basedon the matching filter w and on a penalty function P. As previouslydiscussed, the penalty function penalizes the matching filter when thematching filter deviates from a Delta Dirac function due to a mismatchbetween the measured seismic data and the predicted seismic data. Instep 512, a Rytov approximation is applied to the matching filter w inthe Radon domain. In step 514, the adjoint source is expressed based onthe misfit function (see equations (29) and (30)), in step 516 theadjoint state wavefield u_(R) is calculated, for example, based onequation (1), and in step 518 a gradient of the wave field velocityoperator L with regard to the model parameter m is calculated, forexample. Note that all these steps are calculated in the Radontransform. Based on the calculated gradient g, a global minimum ormaximum of the misfit function J is calculated, which produces in step520 the new velocity model v_(new). This process may be repeated a fewtimes, if necessary, until a predetermined condition is met, forexample, misfit function values is below a given threshold, or thecalculations are repeated a predetermined number of times. In step 522,based on the new velocity model, an image of the subsurface may becalculated.

According to yet another embodiment, a new misfit function is nowdiscussed. The new misfit function may be obtained by applying theoptimal transport theory to the matching filter. A review of theconventional optimal transport theory is believed to be order for abetter understanding of the new misfit function.

Suppose that there are two sets of data, the observed (or measured) setof data d(t, x_(r), x_(s)), which is collected with receivers as shownin FIG. 1 , and a predicted set of data p(t, x_(r), x_(s)), which iscalculated as discussed above, for example, with regard to equation (1).The optimal transport theory indicates that a Wasserstein distance canbe used to define a misfit function as follows:j=min_(T) −∫|t−T(t)|² p(t)dt,  (31)where T is the transport plan, which maps the mass of p into the mass ofd, where p and d are considered to be distributions. Note that theoptimal transport theory finds the map T for two given densities of massd and p, with an integral of p equal to an integral of d equal to 1. Forsimplicity, the x_(r) and x_(s) coordinates of the receivers and sourcesare omitted herein. For a 1D problem, an explicit formula exists for theWasserstein distance, i.e.,J=∫|t−D ⁻¹(P(t))|² p(t)dt,  (32)where D⁻¹ is the inverse function of D, and D and P are the cumulativedistribution functions given by:D(t)=∫₀ ^(t) d(t′)dt′,  (33)P(t)=∫₀ ^(t) p(t′)dt′,  (34)with t′ being an integration variable.

The optimal transport theory requires the functions d and p to bedistributions, i.e., d(t)≥0, p(t)≥0 and ∫d(t)dt=∫p(t)dt=1. However, theseismic data d or p are oscillatory in nature and they have a zero mean.Thus, the seismic data d and p do not fulfil the requirements of adistribution. To fix this problem, it is possible to precondition theseismic data to fulfill this requirement. The preconditioning mayinvolve adding a large value to each trace to make the entire tracenon-negative, followed up by a normalization to make the summation (orintegration) of the traces equal to 1.

However, the conventional optimal transport theory that uses thepreconditioning step noted above modifies not only the amplitude butalso the phase of the predicted and measured seismic data and this makesthe inversion process unstable and inaccurate, which is not desired fora robust FWI process.

Thus, according to this embodiment, a Wasserstein distance is appliedbetween the matching filter and the Dirac Delta function, instead ofbeing applied to the p and d seismic data. Similar to equation (3), theobserved data d(t) and the computed data p(t) can be matched to eachother with a matching filter w(t) as follows:d(t)*w(t)=p(t).  (35)

To fulfill the requirement of the optimal transport theory discussedabove, the matching filter w is preconditioned and modified to be adistribution. Note that this approach does not precondition the observedand predicted seismic data. In this embodiment, the matching filter issquared and normalized as follows:

$\begin{matrix}{{w^{\prime}(t)} = {\frac{w^{2}(t)}{\int{{w^{2}(t)}dt}} = {\frac{w^{2}}{{w}_{2}^{2}}.}}} & (36)\end{matrix}$

However, one skilled in the art would understand that there are otherways to condition the matching filter to fulfill the requirements of theoptimal transport theory. When the model parameters are accurate, theresulting matching filter reduces to a “Dirac Delta function,” i.e., thetarget of this approach is to have the matching filter be δ(t). Usingthe theory of optimal transport, the misfit function J given by equation(32) becomes:

$\begin{matrix}\begin{matrix}{{J\left( {w^{\prime},{\delta(t)}} \right)} = {\int{{{t - {\Delta^{- 1}\left( {W^{\prime}(t)} \right)}}}^{2}{w^{\prime}(t)}{dt}}}} \\{{= \frac{\int{{{t - {\Delta^{- 1}\left( {W^{\prime}(t)} \right)}}}^{2}{w^{2}(t)}dt}}{{w}_{2}^{2}}},}\end{matrix} & (37)\end{matrix}$where the Laplace operator Δ and W′(t) are used to indicate thecommutative distribution function for the Dirac Delta function δ²(t) andthe normalized matching filter w′(t), respectively. Thus, Δ⁻¹ is theinverse function for the commutative distribution function Δ. It is alsopossible to use the 2D or 3D optimal transport theory over moredimensions.

A method for applying this new misfit function J is now discussed withregard to FIG. 6 . In step 600, the measured seismic data d is received.In step 602, a velocity model is selected and in step 604, the predictedseismic data p is calculated using, for example, the wave equationoperator L. In step 606, a matching filter w is selected to match themeasured data d to the predicted data p. The matching filter isconditioned in step 608 to fulfill the optimal transport process, i.e.,the matching filter should be a distribution having all values equal toor larger than zero and a time integral of the entire distributionshould be equal to 1. Thus, in step 608, the matching filter w issquared and normalized based on equation (36). In step 610, the misfitfunction is expressed in terms of the conditioned matching filter andfrom here, in step 612, the steps 210 to 222 of the method illustratedin FIGS. 2A and 2B are performed with the matching filter shown inequation (36) and the misfit function shown in equation (37).Alternatively, step 612 may include the steps 512 to 522 of the methodshown in FIGS. 5A and 5B. These steps are performed for generating a newvelocity model and an image of the subsurface.

The various embodiments discussed above have their advantages over theknown art. For example, with regard to the embodiment illustrated inFIGS. 5A and 5B, a Radon transform was used to transform the seismicdataset from the time-space domain to the tau-p domain and a matchingfilter was computed in the tau-p domain for the observed and predicteddata, so that the matching filter was indexed by the same p value. Asthe cross-talk events are easier separated in the tau-p domain, thecomputation of the matching filter in tau-p domain is more robust, andthus, leads to a more stable and accurate FWI inversion.

In this regard, note that the Radon transform described in thisapplication is a linear version, which in geophysics is called slantstack or tau-p transform. However, other forms of the Radon transformmay used in the above embodiments, like the hyperbolic and/or parabolicRadon transform.

A high-resolution sparse Random transform (Trad et al., 2002) can beadopted and it will improve the resolution of the tau-p domain dataset.It will be beneficial to use a high-resolution sparse Random transformfor a high resolution FWI process aiming at inverting the subsurfacewith more details.

In one application, the Radon transform can be implemented in the timeor the frequency domain. Muting and filtering the seismic data can besimultaneously applied during the Radon transform process and this wouldfurther improve the signal-to-noise ratio (SNR) of the resultingcomputed matching filter and consequently improve the FWI inversionresult.

In one application, the Radon transform was applied to a common shotgather during the inversion. However, the same methodology can work oncommon receiver gathers, common middle point gathers, common azimuthgathers, etc. This is due to the fact that in those reasonably groupedgathers, the seismic event would always show coherent features anddepends on the different type of seismic acquisitions, some of themwould be better than others in terms of SNR.

Although the previous embodiments focus on improving the methodology ofFWI using a matching filter, one skilled in the art would understandthat these embodiments have wider applications. The Radon transform canbe implemented to other misfit functions: e.g., the normalized globalcross-correlation, the optimal transport misfit function. In general, amisfit function tries to compare two traces from the predicted andobserved data. The Radon transform would at first transform the datasetinto the tau-p domain, and then will design a misfit function. In otherwords, the conventional misfit function is designed to compare thetraces that reside in the physical t-x space, where the original signalswere recorded. One or more of the embodiments discussed above use theRadon transform to transform the data into the tau-p domain, where thesignals are well separated and this leads to a more robust misfitfunction design for the FWI process.

According to still another embodiment, it is possible to formulate themisfit function as a least square error of the travel time shift(associated with the measured seismic data) computed by across-correlation analysis of a penalty function with the matchingfilter. This approach extends the misfit function to a more generalizedform in terms of moments of the resulting matching filter distribution.Specifically, the misfit function of this embodiment can be consideredas a (squared) first order moment of the matching filter. The previousembodiments can be considered as a special case of the second-ordermoment, i.e., the variance. Higher order moments such as skewness (thirdorder) or kurtosis (fourth order) can also be used as the misfitfunction. Hybrid misfit functions by combination of different order ofmoment shows a lot of potential. Thus, in this embodiment, a robustmisfit function by least square inversion of the cross-correlationtravel time shift errors is discussed. This new misfit function isformulated using the least square error of the travel time shift foreach trace, which is given by:

$\begin{matrix}{{J = {\sum\limits_{x_{s}}{\sum\limits_{x_{r}}{\frac{1}{2}\Delta{\tau^{2}\left( {x_{s},x_{r}} \right)}}}}},} & (39)\end{matrix}$where x_(s) and x_(r) are the sources and receiver positions for thetrace, and Δτ(x_(s), x_(r)) are the travel time shift computed for eachtrace. To simplify the derivation of the misfit function J, in thefollowing, the summations over sources and receivers are omitted. Thus,equation (39) becomes:J=½Δτ²,  (40)

To compute the travel time shift Δτ, first a matching filter w(t) iscomputed. The matching filter matches the measured data d(t) with thepredicted data p(t) as follows:d(t)*w(t)=p(t),  (41)where * denotes the convolution operation. Equation (41) is a linearequation, and the matching filter w(t) can be computed either in thetime domain or in the frequency domain, as follows:

$\begin{matrix}{{{w(t)} = {\mathcal{F}^{-}\left\lbrack \frac{{\mathcal{F}\left\lbrack {p(t)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}}{{{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}} + \epsilon} \right\rbrack}},} & (42)\end{matrix}$where

and

⁻ denote the Fourier transform and its inverse, respectively, theoverline sign indicates the complex conjugate operation, and e is asmall positive number to avoid dividing over zero.

If the velocity model for generating the predicted data p(t) is correct,the resulting matching filter would be a Dirac Delta function.Otherwise, its non-zero coefficients would spread over non-zero lags. Across-correlation function ƒ(τ) is introduced to measure such defocusingof the matching filter, and is given by:ƒ(τ)=½∫(t+τ)² w ²(t)dt,  (43)where the time shift Δτ defined in equation (40) corresponds to theminimum value of ƒ(τ). If the velocity model is accurate, the matchingfilter w(t) would be a Dirac Delta function, and the minimum value wouldcorrespond to the time shift Δτ=0. If the velocity model is notaccurate, the time shift Δτ is non-zero and by minimizing its squareerrors, it is possible to update the velocity model.

As the travel time shift Δτ corresponds to the minimum value of ƒ(τ),its first order derivative at the corresponding travel time shift wouldbe zero, and thus, the following connective function is obtained:

$\begin{matrix}{{\frac{d{f(\tau)}}{d\tau}\left( {{{when}\mspace{14mu}\tau} = {\Delta\tau}} \right)} = {{\int{\left( {t + {\Delta\tau}} \right){w^{2}(t)}dt}} = 0.}} & (44)\end{matrix}$

Using equation (44), the travel time shift Δτ can be expressed as:

$\begin{matrix}{{\Delta\tau} = {{- \frac{\int{t{w^{2}(t)}dt}}{\int{{w(t)}^{2}dt}}} = {- {\frac{\int{t{w^{2}(t)}dt}}{{w}_{2}^{2}}.}}}} & (45)\end{matrix}$

Substituting equation (45) in equation (40), the final formula for theproposed misfit function J is obtained as:

$\begin{matrix}{J = {{\frac{1}{2}\left\lbrack \frac{\int{t{w^{2}(t)}dt}}{{w}_{2}^{2}} \right\rbrack}^{2}.}} & (46)\end{matrix}$

To update the velocity model, it is necessary to compute the adjointsource, which is the derivative of the misfit function with respect tothe predicted data, which is given by:

$\begin{matrix}{{{{\delta s} = {\frac{\partial J}{\partial p} = \frac{\partial w}{\partial p}}}\frac{\partial J}{\partial w}},} & (47)\end{matrix}$where

$\frac{\partial J}{\partial w}$is the derivative of the misfit function of equation (46) with respectto the matching filter w(t), which can be expressed as:

$\begin{matrix}{{\frac{\partial J}{\partial w} = {- \frac{2\left( {{t\Delta\tau} + {\Delta\tau^{2}}} \right){w(t)}}{{w}_{2}^{2}}}}.} & (48)\end{matrix}$

The term

$\frac{\partial w}{\partial p}$in equation (47) is the derivative of the matching filter with respectto the predicated data, and it can be computed by the adjoint analysisof equation (43) as follows:

$\begin{matrix}{\frac{\partial w}{\partial p} = {\mathcal{F}^{-}{{diag}\left\lbrack \frac{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}{{{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}\overset{\_}{\mathcal{F}\left\lbrack {d(t)} \right\rbrack}} + \epsilon} \right\rbrack}{\mathcal{F}.}}} & (49)\end{matrix}$

As long as the adjoint of the source is calculated, the next steps forthis embodiment are similar to those of the previous embodiments, andthus, they are not repeated herein. It is possible to back propagate theadjoint source, compute the gradient, apply a linear search, and updatethe model.

According to an embodiment, the misfit function J of equation (46) canbe extended to the following general form:J _(n)=μ_(n) ^(mod(2,n)==0?1:2) =[∫t ^(n) w′(t)dt]^(mod(2,n)==0?1:2),  (50)where mod(2, n)==0?1:2 indicates that if n is odd, the result should be1, or otherwise the result is 2. The function w′(t) is a distributioncomputed by a precondition of the matching filter w(t). The term μ_(n)is the n-th order moment of distribution, which is given by:μ_(n) =∫t ^(n) w′(t)dt.  (51)

Using equation (50), the proposed misfit function J of equation (46) isnothing else than the square of the first order moment (the mean)corresponding to n=1 with the precondition defined by:

$\begin{matrix}{{w^{\prime}(t)} = {\frac{w(t)}{{w}_{2}^{2}}.}} & (52)\end{matrix}$

In one application, another precondition may be selected, which mayeffect the performance of the resulting misfit function.

Different misfit functions may be constructed by using different nvalues. For example, the misfit function of the adaptive waveforminversion (see Warner and Guasch, 2016) can be considered with a penaltyterm selected to be t², and this corresponds to the generalized misfitfunction of equation (50) with n=2, as follows:

$\begin{matrix}{{J_{2} = \frac{\int{t^{2}{w(t)}^{2}dt}}{{w}_{2}^{2}}}.} & (53)\end{matrix}$

Thus, the corresponding μ₂ for the AWI is the second order moment, i.e.,the variance of the matching filter distribution. Higher order momentssuch as the skewness (n=3) or the kurtosis (n=4) may also be useful fora robust misfit function.

In one application, it is possible to use a hybrid misfit function,which is defined by combining more than one of the moments in the misfitfunction, e.g.,J _(1,2)=μ₁ ²+λμ₂,  (54)where λ is a weighting factor.

The above-discussed procedures and methods may be implemented in acomputing device as illustrated in FIG. 7 . Hardware, firmware, softwareor a combination thereof may be used to perform the various steps andoperations described herein. Computing device 700 of FIG. 7 is anexemplary computing structure that may be used in connection with such asystem.

Exemplary computing device 700 suitable for performing the activitiesdescribed in the above embodiments may include a server 701. Such aserver 701 may include a central processor (CPU) 702 coupled to a randomaccess memory (RAM) 704 and to a read-only memory (ROM) 706. ROM 706 mayalso be other types of storage media to store programs, such asprogrammable ROM (PROM), erasable PROM (EPROM), etc. Processor 702 maycommunicate with other internal and external components throughinput/output (I/O) circuitry 708 and bussing 710 to provide controlsignals and the like. Processor 702 carries out a variety of functionsas are known in the art, as dictated by software and/or firmwareinstructions.

Server 701 may also include one or more data storage devices, includinghard drives 712, CD-ROM drives 714 and other hardware capable of readingand/or storing information, such as DVD, etc. In one embodiment,software for carrying out the above-discussed steps may be stored anddistributed on a CD-ROM or DVD 716, a USB storage device 718 or otherform of media capable of portably storing information. These storagemedia may be inserted into, and read by, devices such as CD-ROM drive714, disk drive 712, etc. Server 701 may be coupled to a display 720,which may be any type of known display or presentation screen, such asLCD, plasma display, cathode ray tube (CRT), etc. A user input interface722 is provided, including one or more user interface mechanisms such asa mouse, keyboard, microphone, touchpad, touch screen, voice-recognitionsystem, etc.

Server 701 may be coupled to other devices, such as sources, detectors,etc. The server may be part of a larger network configuration as in aglobal area network (GAN) such as the Internet 728, which allowsultimate connection to various landline and/or mobile computing devices.

The disclosed embodiments provide methods and devices that generate amisfit function based on a matching filter and the misfit functionintroduces a penalty for the matching filter and not for the seismicdata. It should be understood that this description is not intended tolimit the invention. On the contrary, the embodiments are intended tocover alternatives, modifications and equivalents, which are included inthe spirit and scope of the invention as defined by the appended claims.Further, in the detailed description of the embodiments, numerousspecific details are set forth in order to provide a comprehensiveunderstanding of the claimed invention. However, one skilled in the artwould understand that various embodiments may be practiced without suchspecific details.

Although the features and elements of the present embodiments aredescribed in the embodiments in particular combinations, each feature orelement can be used alone without the other features and elements of theembodiments or in various combinations with or without other featuresand elements disclosed herein.

This written description uses examples of the subject matter disclosedto enable any person skilled in the art to practice the same, includingmaking and using any devices or systems and performing any incorporatedmethods. The patentable scope of the subject matter is defined by theclaims, and may include other examples that occur to those skilled inthe art. Such other examples are intended to be within the scope of theclaims.

REFERENCES

-   Debens, H. A., F. Mancini, M. Warner, and L. Guasch, 2017,    Full-bandwidth adaptive waveform inversion at the reservoir: SEG    Technical Program Expanded Abstracts 2017, 1378-1382.-   Huang, G., R. Nammour, and W. Symes, 2017, Full-waveform inversion    via source-receiver extension: Geophysics, 82, R153-R171.-   Luo, S., and P. Sava, 2011, A deconvolution based objective function    for wave equation inversion: SEG Technical Program Expanded    Abstracts 2011, 2788-2792.-   Metivier, L., R. Brossier, Q. Merigot, E. Oudet, and J. Virieux,    2016, Measuring the misfit between seismograms using an optimal    transport distance: application to full waveform inversion:    Geophysical Journal International, 205, 345-377.

What is claimed is:
 1. A method for calculating a velocity model for asubsurface of the earth, the method comprising: receiving measuredseismic data d; calculating predicted seismic data p; selecting amatching filter w such that a convolution operation between the matchingfilter w and one of the measured seismic data d or the predicted seismicdata p is equal to the other one of the measured seismic data d or thepredicted seismic data p; selecting a misfit function J that calculates(1) a distance between the matching filter w and a Dirac Delta functionor (2) a travel time shift associated with the measured seismic data;calculating a new velocity model using the misfit function J, themeasured seismic data d, and the predicted seismic data p; andgenerating an image of the subsurface of the earth, based on the newvelocity model, for identifying resources in the subsurface, wherein themeasured seismic data d includes wavefields generated by a seismicsource and the wavefields propagate through the subsurface where theyare attenuated and reflected, and the attenuated and reflectedwavefields are recorded by plural seismic receivers.
 2. The method ofclaim 1, wherein the step of selecting the misfit function J comprises:defining the misfit function J as a product of the matching filter w anda penalty function P that varies with a time t.
 3. The method of claim2, wherein the penalty function P is equal to an absolute value of timet.
 4. The method of claim 2, further comprising: applying a Rytovapproximation to the matching filter w.
 5. The method of claim 4,wherein the Rytov approximation rewrites the matching filter w as a sumof shifted weighted Dirac Delta functions.
 6. The method of claim 5,further comprising: calculating an adjoint source s, which is associatedwith a mismatch between the measured seismic data d and the predictedseismic data p, using the misfit function J.
 7. The method of claim 6,further comprising: calculating an adjoint state wavefield u_(P) usingthe adjoint source s and a wave equation operator L; and calculating agradient g of the wave equation operator L.
 8. The method of claim 7,further comprising: calculating the new velocity model using thegradient g.
 9. The method of claim 1, further comprising: conditioningthe matching filter w to be a distribution whose integral over time isunity and each value of the distribution is equal to or larger thanzero.
 10. A computing unit for calculating a velocity model for asubsurface of the earth, the computing unit comprising: an interface forreceiving measured seismic data d; and a processor connected to theinterface and configured to: calculate predicted seismic data p; selecta matching filter w such that a convolution operation between thematching filter w and the measured seismic data is equal to thepredicted seismic data p; select a misfit function J that that isindicative of (1) a distance between the matching filter w and a DiracDelta function or (2) a travel time shift associated with the measuredseismic data; calculate a new velocity model using the misfit functionJ, the measured seismic data d, and the predicted seismic data p; andgenerating an image of the subsurface of the earth, based on the newvelocity model, for identifying resources in the subsurface, wherein themeasured seismic data d includes wavefields generated by a seismicsource and the wavefields propagate through the subsurface where theyare attenuated and reflected, and the attenuated and reflectedwavefields are recorded by plural seismic receivers.
 11. The computingunit of claim 10, wherein the processor is further configured to:calculate the misfit function J as a product of the matching filter wand a penalty function P that varies with a time t.